GLMM Maternal Mortality Federally- Summer 2025

This is a Report Template

Author

Carolyn Herrera & Catherine Funte (Advisor: Dr. Cohen)

Published

July 30, 2025

Slides: slides.html ( Go to slides.qmd to edit)

Important

Remember: Your goal is to make your audience understand and care about your findings. By crafting a compelling story, you can effectively communicate the value of your data science project.

Carefully read this template since it has instructions and tips to writing!

Introduction

Generalized Linear Mixed Models (GLMMs) are a flexible class of statistical models that combine the features of two powerful tools: Generalized Linear Models (GLMs) and Mixed-Effects Models(Agresti 2015). Like GLMs, GLMMs can model non-normal outcome variables, such as binary, count, or proportion data. However, they go a step further by incorporating random effects, which account for variation due to grouping or clustering in the data, correlated observations, and overdispersion. The GLMM method can work with complex data structures such as temporal correlation, spatial correlation , nested data, heterogeneity of variance, and repeated measurement(Zuur et al. 2009).

In practical terms, GLMMs are especially useful when data points are not independent, such as when students are nested within schools, patients are treated within hospitals, or repeated measures are taken from the same subject over time. For example, Thall wrote that issues with longitudinal clinical trial basic count data from repeated measures taken from the same subject over time will have problems detecting comparable between subject outcomes because it can be difficult to determine if outcomes are time dependent or due to treatment groups, thus a general linear mixed model method may be utilized to represent dependence upon each patient, incorporate covariate data, create time as a function, account for variability between patients,and be flexible and tractable (Thall 1988). The random effects help model the correlation within clusters and allow for unobserved heterogeneity—differences that are not captured by the measured covariates.

GLMMs are good for:

  • Handling hierarchical or grouped data (e.g., students within classrooms, patients within clinics)(Lee and Nelder 1996)

  • Modeling non-normal outcomes, such as:

  • Improving inference by accounting for both fixed effects (predictors of interest) and random effects (random variation across groups)

  • Reducing bias and inflated Type I error rates that can result from ignoring data structure(Thompson et al. 2022)

GLMMs are ideal when your data is both complex in structure and involves non-Gaussian response variables, making them indispensable in fields like medicine, ecology, education, and social sciences. Tawiah et al describes zero-inflated Poisson GLMMs, an extension of Poisson GLMM that allows for overdispersion due to a prevalence of zeros in the data, common in health sector data(Tawiah, Iddi, and Lotsi 2020). The paper compares a Poisson GLM, a zero-inflated Poisson GLM, a Poisson GLMM, and a zero-inflated Poisson GLMM, applied to clustered maternal mortality data. Another paper by Owili et al utilizes a GLMM to investigate the impact of particulate matter on maternal and infant mortality globally (Owili et al. 2020). They use a Poisson link function and take year and country as random effects to account for differences in global data.

We wish to analyze the data of federal maternal mortality deaths via VSRR Provisional Maternal Death Counts and Rates dataset by utilizing a General linear mixed model with Poisson link as it is count data. We wish to see if ethnicity (a fixed effect) has any influence upon maternal death per live births(number of live births per 12 month period or year) count by year(random effect). Like other public health or clinical data there will be issues such as correlated observations and overdispersion but GLMM will be utilized to parse through the noise and determine if indeed there are some patterns of maternal mortality among mothers of differing ethnic ties.

Methods

Math Background

GLMMs can be considered an extension of GLMs, wherein a GLM includes the addition of random effects, or an extension of Linear Mixed Models (LMMs), where a linear model with fixed and random effects is extended for non-normal distributions(Salinas Ruı́z et al. 2023). Let

  • \(\mathbf{y}\) be a \(Nx1\) column vector outcome variable

  • \(\mathbf{X}\) be a \(Nxp\) matrix for the \(p\) predictor variables

  • \(\boldsymbol{\beta}\) be a \(px1\) column vector of the fixed effects coefficients

  • \(\mathbf{Z}\) is a \(Nxq\) matrix of the \(q\) random effects

  • \(\mathbf{u}\) is a \(qx1\) vector of random effects, and

  • \(\boldsymbol{\epsilon}\) is a \(Nx1\) column vector of the residuals

Then the general equation for the model is given by:

\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}{u}+\boldsymbol{\epsilon}\] (Salinas Ruı́z et al. 2023). The GLMM Model process is that the analyis of variance model or the equation is a vector of linear predictors with of unknown parameters estimates. Each distribution has is its own probability function which we will utilize the Negative Binomial as GLMMs typically include a link function that relates the response variable \(\mathbf{y}\) to a linear predictor, \(\eta\), which excludes the residuals. So then \[\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\lambda}\] The link function is \(g(\cdot)\), where \[g(E(\mathbf{y}))=\boldsymbol{\eta}\] where \(E(\mathbf{y})\) is the expectation of . The choice of link function depends on the outcome distribution. For this paper our data demonstrates a Negative Binomial distribution for overdispered count data, so we will use a log link function.
\[g(\cdot)=log_e(\cdot)\] In the GLMM model the parameter estimates is solved by reducing the negative log likelihood functions(Salinas Ruı́z et al. 2023). The means or the least square means are derivative of the parameter estimates and are found on the model scale. The link function, negative binomial log link, will convert the mean estimates at the model scale to the data scale. Negative Binomial Distribution: \[ f(y;k,{\mu})=\frac{\Gamma(y+k)}{\Gamma(k)*(y+1)}\left(\frac{k}{\mu+k}\right)^{k}\left(1-\frac{k}{\mu+k}\right)^{y} \] Zuur writes that the negative binomial Distribution has two parameters \({\mu}\) and \(k\) (Zuur et al. 2009). The symbol \({\Gamma}\) is defined as \({\Gamma(y+1)=(y+1)!}\) The Mean of Negative Binomial is given: \(E(Y)= {\mu}\) The Variance of Negative binomial is given; \(Var(Y)= {\mu}+ \left(\frac{\mu^2}{k}\right)\), where second term determines the overdispersion, \(k\) is called the dispersion parameter and indirectly determines overdispersion. If \(k\) is significantly large relative to \({\mu^2}\) then the second term will approximate to zero and a Poisson distribution may as well be used. However, the smaller the \(k\) value the larger the overdispersion may form and then negative binomial is the correct log link to utilize.

GLMM model process is

Assumptions

Before deciding to use a GLMM for our data, we had to check some assumptions (specific to our negative binomial distributed data).

  • The response variable and the predictors have a linear relationship within the levels of random effects.
  • The response variable is assumed to follow a negative binomial distribution, with \(\sigma^2>\mu\).
  • The residuals and random effects are independent.
  • The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma\).

Why Chosen for

The Negative Binomial distribution is ideal for overdispersed count data in a fashion the poisson distribution cannot accommodate due Negative Binomial distribution’s quadratic nature of mean-variance relationship. Overdispersion is a given for populations due to their heterogeneity that aggregation processes such as community clusters form with shared traits such as ethnicity. Our data includes community cluster groups such as ethnicity and age groups. The negative binomial model is desirable also due to its easily intrepretable dispersion parameter as a measure of aggregation and its tractability due to closed form expression of probability mass function which helps in convenient model inference and estimation. Such as our data is longitudinal a Negative Binomial GLMM is ideal since measurements are taken over time. We incoporate Year as a random effect because it could explain for variation in our model which could not be explained by our fixed effects such as ethnicity or age groups.

WE shall analyze by using R(R Core Team 2025) and the packages lme4{Bates et al. (2015)} with function glmer to investigate poisson model or glmer.nb() for negative binomial model and the package glmmTMB{McGillycuddy et al. (2025)} with the function, nbinom2(), for negative binomial model. The approximate parameterestimation method uses for glmmTMB uses AGQ or Laplace approximation with Wald Z significance test. The approximate parameter estimation method for lme4 default is Laplace approximation when no quadrature point is used but is pseudo-likelihood method in linearization,and integral approximation the and another estimation method is Gauss-Hermite quadrature to approximate the log-likelihood using numerical integration(Salinas Ruı́z et al. 2023).

Densities are seen as counts per volume and can be modeled with NB and offset variable(Zuur et al. 2009). We are investigation maternal deaths per live births and while we can use the variable rate, it would be less noisy if we used the offset variable log(Live Births). We can also model without the offset and utilize Akaike information criterion,AIC, to see which is the better model as the smaller the AIC would indicate a better model. A random effect could be month,year, or month/year. Fixed effects could be age groups, ethnicity, and perhaps year. Our choice of random or fixed effects are dependent on the question we which to investigate.

Analysis and Results

Data Exploration and Visualization

The data we used come from the National Vital Statistics System, an organization that provides national data about births, deaths, marriages, divorces, and fetal deaths. It is a collaboration between the National Center for Health Statistics (NCHS) (a part of the CDC) and state vital records offices. {create citation https://www.cdc.gov/nchs/nvss/about_nvss.htm} This dataset is the Vital Statistics Rapid Release (VSRR) Provisional Maternal Death Counts and Rates, in the form of a .csv file. The dataset contains monthly death counts and death rates from 2019 to 2024 by race and ethnicity, age and overall. Rates are maternal deaths per 100,000 live birthd. For this dataset, a maternal death is defined as “the death of a woman while pregnant or within 42 days of termination of pregnancy irrespective of the duration and the site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management, but not from accidental or incidental causes.” {create citation https://www.cdc.gov/nchs/nvss/vsrr/provisional-maternal-deaths-rates.htm}

  • Present initial findings and insights through visualizations.

  • Highlight unexpected patterns or anomalies.

Code
# loading packages 
library(tidyverse)
library(knitr)
library(ggthemes)
library(ggrepel)
library(dslabs)
library(ggplot2)
library(patchwork) # For combining plots
library(ggpubr)
library(DataExplorer) #For EDA
options(repos = c(CRAN = "https://cloud.r-project.org"))#specify cran mirror
if (!require("lme4")) install.packages("lme4")#if required to install
library(lme4)
##to check if model is overdispersed'
if (!require("performance")) install.packages("performance")#if required to install
library(performance)
##Overdispersion is detected with Poisson model thus we must use negative binomial model
if (!require("glmmTMB")) install.packages("glmmTMB")#if required to install
library(glmmTMB)
Code
df <- read.csv("data/VSRR_Provisional_Maternal_Death_Counts_and_Rates_20250726.csv")#loading data
#variables- Maternal.Deaths--outcome variable
#Live.Births---offset variable since it is maternal deaths per live births
#Subgroup- ethnicity and year of mother must be edited --- these are fixed
#Year.of.Death---random effect variable
#Summary
#renaming columns
df<-df%>% 
  rename(Year= Year.of.Death, Month= Month.of.Death, Maternal_Deaths= Maternal.Deaths, Live_Births= Live.Births, Maternal_Mortality_Rate= Maternal.Mortality.Rate, Time_Period= Time.Period, Month_Ending_Date= Month.Ending.Date, Data_As_Of= Data.As.Of) 
  
filtered_df<- df%>% filter(Maternal_Deaths > 0, Live_Births > 100) #Filtering out rare combinations
  
test_df <- na.omit(filtered_df)#filtered na
Code
#Summary Statistics
deaths_df <-test_df %>%
  mutate(
    Ethnicity = case_when(
      Group == "Race and Hispanic origin" ~ Subgroup,
      TRUE ~ NA_character_
    ),
    Age_Group = case_when(
      Group == "Age" ~ Subgroup,
      TRUE ~ NA_character_
    ),
    Is_Total = Subgroup == "Total"
  ,
  Year = as.factor(Year),
  Month= as.factor(Month),
  Ethnicity=as.factor(Ethnicity),
  Age_Group=as.factor(Age_Group))#mutating groups

deaths_df_ethnic<-deaths_df %>%
  filter(!is.na(Ethnicity))###filter out is total if ethnicity or age group are na

deaths_df_age <-deaths_df %>%
  filter(!is.na(Age_Group))
merged_deaths_df <-  bind_rows(deaths_df_ethnic, deaths_df_age) %>%
  distinct() 
deaths_df2<-merged_deaths_df %>%
  filter(!(Is_Total & (is.na(Ethnicity) | is.na(Age_Group))))
##to get total deaths per year by ethnicity
  df_ethnicity_year <- deaths_df %>%
    filter(!is.na(Ethnicity), !is.na(Year)) %>%
    group_by(Year, Ethnicity) %>%
    summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
  df_ethnicity_year
# A tibble: 26 × 3
# Groups:   Year [6]
   Year  Ethnicity                                      Maternal_Deaths
   <fct> <fct>                                                    <int>
 1 2019  Asian, Non-Hispanic                                        422
 2 2019  Black, Non-Hispanic                                       2717
 3 2019  Hispanic                                                  1193
 4 2019  White, Non-Hispanic                                       3842
 5 2020  Asian, Non-Hispanic                                        339
 6 2020  Black, Non-Hispanic                                       3148
 7 2020  Hispanic                                                  1587
 8 2020  White, Non-Hispanic                                       4100
 9 2021  American Indian or Alaska Native, Non-Hispanic             150
10 2021  Asian, Non-Hispanic                                        352
# ℹ 16 more rows
Code
df_age_year <- deaths_df %>%
  filter(!is.na(Age_Group), !is.na(Year)) %>%
  group_by(Year, Age_Group) %>%
  summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
df_age_year
# A tibble: 18 × 3
# Groups:   Year [6]
   Year  Age_Group         Maternal_Deaths
   <fct> <fct>                       <int>
 1 2019  25-39 years                  6017
 2 2019  40 years and over            1139
 3 2019  Under 25 years               1320
 4 2020  25-39 years                  6824
 5 2020  40 years and over            1361
 6 2020  Under 25 years               1312
 7 2021  25-39 years                  8338
 8 2021  40 years and over            1975
 9 2021  Under 25 years               1558
10 2022  25-39 years                  9191
11 2022  40 years and over            2035
12 2022  Under 25 years               1796
13 2023  25-39 years                  6262
14 2023  40 years and over            1158
15 2023  Under 25 years               1134
16 2024  25-39 years                  5610
17 2024  40 years and over            1252
18 2024  Under 25 years               1386
Code
df_total_year <-deaths_df%>%
  filter(Group=="Total") %>%
  group_by(Year, Is_Total) %>%
  summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
df_total_year
# A tibble: 6 × 3
# Groups:   Year [6]
  Year  Is_Total Maternal_Deaths
  <fct> <lgl>              <int>
1 2019  TRUE                8482
2 2020  TRUE                9503
3 2021  TRUE               11871
4 2022  TRUE               13022
5 2023  TRUE                8554
6 2024  TRUE                8248
Code
deaths_df2<-deaths_df %>%
  filter(!(Is_Total & (is.na(Ethnicity) | is.na(Age_Group))))###filter out is total if ethnicity or age when total is true
Code
plot_missing(deaths_df)

Code
#Data Exploration Graphs
colors=c('yellow','orange', 'blue', 'red', 'green')
df_ethnicity <- deaths_df %>%
    filter(!is.na(Ethnicity)) %>%
    group_by(Ethnicity) %>%
    summarise(Maternal_Deaths = sum(Maternal_Deaths, na.rm = TRUE))
barplot(df_ethnicity$Maternal_Deaths,names.arg=c('Am In/Ala', 'Asian, NH','Black, NH','Hispanic','White,NH'),main="Maternal Mortality Totals 2019-2024 by Ethnicity",col=colors,xlab='Ethnicity',ylab='Maternal Deaths')

Code
df_tot_year_ts<-ts(df_total_year$Maternal_Deaths, start = c(2019), frequency = 1)
plot.ts(df_tot_year_ts,main='Yearly Total Maternal Deaths 2019-2024',xlab='Year',ylab='Maternal Deaths',col='navy')

Code
df_tot_year_rate <- deaths_df%>%
  filter(Group=='Total')%>%
  select(Maternal_Mortality_Rate)
df_tot_year_rate_ts<-ts(df_tot_year_rate$Maternal_Mortality_Rate, start = c(2019,1), frequency = 12)
plot.ts(df_tot_year_rate_ts,main='Monthly Total Maternal Mortality Rate 2019-2024',xlab='Year',ylab='Maternal Deaths per 100,000 Live Births',col='navy')

Code
total_df <- deaths_df%>%
  filter(Group=='Total')
total_ts <- ts(total_df$Maternal_Deaths,start=c(2019,1),frequency=12)
plot.ts(total_ts,main='Monthly Total Maternal Deaths 2019-2024',ylab='Maternal Deaths',xlab='Time',col='navy')  

Code
His <- filtered_df%>%
  filter(Subgroup=='Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename(Hispanic=Maternal_Deaths)
bl <- filtered_df%>%
  filter(Subgroup=='Black, Non-Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename('Black_Non-Hispanic' = Maternal_Deaths)
wh <- filtered_df%>%
  filter(Subgroup=='White, Non-Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename('White_Non-Hispanic'=Maternal_Deaths)
as <- filtered_df%>%
  filter(Subgroup=='Asian, Non-Hispanic')%>%
  select(Maternal_Deaths)%>%
  rename('Asian_Non-Hispanic'=Maternal_Deaths)
aina <- df%>%
  filter(Subgroup=='American Indian or Alaska Native, Non-Hispanic')%>%
  select(Maternal_Deaths)%>% 
  rename('American_Indian_or_Alaska_Native, Non-Hispanic'=Maternal_Deaths)
ethn_ts <- bind_cols(His,bl,wh,as,aina) 
ethn_ts <- ts(ethn_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green','orange','yellow')
plot.ts(ethn_ts, plot.type='single',main='Monthly Maternal Deaths by Ethnicity 2019-2024', ylab='Maternal Deaths',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Hispanic", "Black, Non-Hispanic", "White, Non-Hispanic",'Asian, Non-Hispanic','American Indian, Alaska Native, Non-Hisp'),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.8)                      

Code
HisR <- filtered_df%>%
  filter(Subgroup=='Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename(Hispanic=Maternal_Mortality_Rate)
blR <- filtered_df%>%
  filter(Subgroup=='Black, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('Black_Non-Hispanic' = Maternal_Mortality_Rate)
whR <- filtered_df%>%
  filter(Subgroup=='White, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('White_Non-Hispanic'=Maternal_Mortality_Rate)
asR <- filtered_df%>%
  filter(Subgroup=='Asian, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('Asian_Non-Hispanic'=Maternal_Mortality_Rate)
ainaR <- df%>%
  filter(Subgroup=='American Indian or Alaska Native, Non-Hispanic')%>%
  select(Maternal_Mortality_Rate)%>% 
  rename('American_Indian_or_Alaska_Native, Non-Hispanic'=Maternal_Mortality_Rate)
ethnR_ts <- bind_cols(HisR,blR,whR,asR,ainaR) 
ethnR_ts <- ts(ethnR_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green','orange','yellow')
plot.ts(ethnR_ts, plot.type='single',main='Monthly Maternal Mortality Rate by Ethnicity 2019-2024', ylab='Maternal Deaths per 100,000 Live Births',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Hispanic", "Black, Non-Hispanic", "White, Non-Hispanic",'Asian, Non-Hispanic','American Indian, Alaska Native, Non-Hisp'),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.7)            

Code
ageu25 <- deaths_df%>%
  filter(Subgroup=='Under 25 years')%>%
  select(Maternal_Deaths)%>%
  rename('Under_25_years'=Maternal_Deaths)
age25 <- filtered_df%>%
  filter(Subgroup=='25-39 years')%>%
  select(Maternal_Deaths)%>%
  rename('25-39_years' = Maternal_Deaths)
ageo40 <- filtered_df%>%
  filter(Subgroup=='40 years and over')%>%
  select(Maternal_Deaths)%>%
  rename('40_years_and_over'=Maternal_Deaths)
age_ts <- bind_cols(ageu25,age25,ageo40) 
age_ts <- ts(age_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green')
plot.ts(age_ts, plot.type='single',main='Monthly Maternal Deaths by Age Group 2019-2024', ylab='Maternal Deaths',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Under 25 Years", "25-39 Years", "40 Years and over"),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.8) 

Code
ageu25R <- deaths_df%>%
  filter(Subgroup=='Under 25 years')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('Under_25_years'=Maternal_Mortality_Rate)
age25R <- filtered_df%>%
  filter(Subgroup=='25-39 years')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('25-39_years' = Maternal_Mortality_Rate)
ageo40R <- filtered_df%>%
  filter(Subgroup=='40 years and over')%>%
  select(Maternal_Mortality_Rate)%>%
  rename('40_years_and_over'=Maternal_Mortality_Rate)
ageR_ts <- bind_cols(ageu25R,age25R,ageo40R) 
ageR_ts <- ts(ageR_ts, start=c(2019,1), frequency = 12)
colors=c('red','blue','green')
plot.ts(ageR_ts, plot.type='single',main='Monthly Maternal Mortality Rate by Age Group 2019-2024', ylab='Maternal Deaths per 100,000 live Births',xlab='Time',col=colors)
legend("topleft",                     
       legend = c("Under 25 Years", "25-39 Years", "40 Years and over"),  # Labels
       col = colors,                   
       lty = 1,                        
       lwd = 2,                     
       bty = "n",
       cex = 0.8) 

Code
plot_histogram(deaths_df_ethnic$Maternal_Deaths,title='Histogram of Maternal Deaths - Ethnicity')

Code
plot_histogram(deaths_df_age$Maternal_Deaths, title='Histogram of Maternal Deaths - Age')

Code
plot_histogram(total_df$Maternal_Deaths,title='Histogram of Maternal Deaths - Total')

Code
#check overdispersion by for ethnicity data
deaths_df_ethnic %>% summarize(mean(Maternal_Deaths),
var(Maternal_Deaths))
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              191.3046             17438.19
Code
#check overdispersion for age data
deaths_df_age %>% summarize(mean(Maternal_Deaths),var(Maternal_Deaths))
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              276.2407              54102.1
Code
#check overdispersion for total data
total_df %>% summarize(mean(Maternal_Deaths),
var(Maternal_Deaths))
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              828.8889             30819.62
Code
poissonmodel_glmm_ethnicity <- glmer(Maternal_Deaths ~ Ethnicity  + (1 | Month/Year), 
                              offset = log(Live_Births),
                              family = poisson(link = "log"), 
                              data = deaths_df)
poissonmodel_glmm_ethnicity
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: Maternal_Deaths ~ Ethnicity + (1 | Month/Year)
   Data: deaths_df
 Offset: log(Live_Births)
      AIC       BIC    logLik -2*log(L)  df.resid 
 2758.584  2784.557 -1372.292  2744.584       295 
Random effects:
 Groups     Name        Std.Dev.
 Year:Month (Intercept) 0.1892  
 Month      (Intercept) 0.0000  
Number of obs: 302, groups:  Year:Month, 72; Month, 12
Fixed Effects:
                 (Intercept)  EthnicityAsian, Non-Hispanic  
                     -7.2768                       -1.6155  
EthnicityBlack, Non-Hispanic             EthnicityHispanic  
                     -0.3041                       -1.4428  
EthnicityWhite, Non-Hispanic  
                     -1.3299  
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 
Code
jack<-deaths_df2%>%filter(Group != "Total" & Subgroup != "Total")
Code
poissonmodel_glmm_both <- glmer(Maternal_Deaths ~ Subgroup+ (1|Month/Year),
                              offset=log(Live_Births),
                              family = poisson(link = "log"), 
                              data = jack)
poissonmodel_glmm_both
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: Maternal_Deaths ~ Subgroup + (1 | Month/Year)
   Data: jack
 Offset: log(Live_Births)
      AIC       BIC    logLik -2*log(L)  df.resid 
 4606.684  4649.184 -2293.342  4586.684       508 
Random effects:
 Groups     Name        Std.Dev.
 Year:Month (Intercept) 0.1925  
 Month      (Intercept) 0.0000  
Number of obs: 518, groups:  Year:Month, 72; Month, 12
Fixed Effects:
                                           (Intercept)  
                                               -8.4596  
                             Subgroup40 years and over  
                                                1.4269  
SubgroupAmerican Indian or Alaska Native, Non-Hispanic  
                                                1.1756  
                           SubgroupAsian, Non-Hispanic  
                                               -0.4335  
                           SubgroupBlack, Non-Hispanic  
                                                0.8776  
                                      SubgroupHispanic  
                                               -0.2603  
                                SubgroupUnder 25 years  
                                               -0.3922  
                           SubgroupWhite, Non-Hispanic  
                                               -0.1480  
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 
Code
poissonmodel_glmm_ethnicity_nooffset <- glmer(Maternal_Deaths ~ Ethnicity + (1 |Month/ Year), 
                              family = poisson(link = "log"), 
                              data = deaths_df)
poissonmodel_glmm_ethnicity_nooffset
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: Maternal_Deaths ~ Ethnicity + (1 | Month/Year)
   Data: deaths_df
      AIC       BIC    logLik -2*log(L)  df.resid 
 2783.250  2809.223 -1384.625  2769.250       295 
Random effects:
 Groups     Name        Std.Dev. 
 Year:Month (Intercept) 1.881e-01
 Month      (Intercept) 8.126e-05
Number of obs: 302, groups:  Year:Month, 72; Month, 12
Fixed Effects:
                 (Intercept)  EthnicityAsian, Non-Hispanic  
                      2.8909                        0.5271  
EthnicityBlack, Non-Hispanic             EthnicityHispanic  
                      2.6865                        2.1079  
EthnicityWhite, Non-Hispanic  
                      2.9372  
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 
Code
poissonmodel_glmm_both_nooffset <- glmer(Maternal_Deaths ~ Subgroup + (1|Month/Year),#includes ethncity and age
                              family = poisson(link = "log"), 
                              data = jack)
poissonmodel_glmm_both_nooffset
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: Maternal_Deaths ~ Subgroup + (1 | Month/Year)
   Data: jack
      AIC       BIC    logLik -2*log(L)  df.resid 
 4565.783  4608.283 -2272.891  4545.783       508 
Random effects:
 Groups     Name        Std.Dev.
 Year:Month (Intercept) 0.1905  
 Month      (Intercept) 0.0000  
Number of obs: 518, groups:  Year:Month, 72; Month, 12
Fixed Effects:
                                           (Intercept)  
                                                6.3553  
                             Subgroup40 years and over  
                                               -1.5551  
SubgroupAmerican Indian or Alaska Native, Non-Hispanic  
                                               -3.4717  
                           SubgroupAsian, Non-Hispanic  
                                               -2.9378  
                           SubgroupBlack, Non-Hispanic  
                                               -0.7785  
                                      SubgroupHispanic  
                                               -1.3570  
                                SubgroupUnder 25 years  
                                               -1.6026  
                           SubgroupWhite, Non-Hispanic  
                                               -0.5277  
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 
Code
check_overdispersion(poissonmodel_glmm_ethnicity)
# Overdispersion test

       dispersion ratio =   1.386
  Pearson's Chi-Squared = 408.991
                p-value = < 0.001
Code
check_overdispersion(poissonmodel_glmm_ethnicity_nooffset)
# Overdispersion test

       dispersion ratio =   1.454
  Pearson's Chi-Squared = 428.875
                p-value = < 0.001
Code
check_overdispersion(poissonmodel_glmm_both)
# Overdispersion test

       dispersion ratio =   1.299
  Pearson's Chi-Squared = 659.928
                p-value = < 0.001
Code
check_overdispersion(poissonmodel_glmm_both_nooffset)
# Overdispersion test

       dispersion ratio =   1.205
  Pearson's Chi-Squared = 612.172
                p-value =   0.001

##overdispersion still detected

Code
##with offset
ethnicityonly_glmmodel_nb <- glmmTMB(
  Maternal_Deaths ~  Ethnicity + (1 |Month/Year),
  offset=log(Live_Births),
  family = nbinom2,
  data = deaths_df
)
summary(ethnicityonly_glmmodel_nb)
 Family: nbinom2  ( log )
Formula:          Maternal_Deaths ~ Ethnicity + (1 | Month/Year)
Data: deaths_df
 Offset: log(Live_Births)

      AIC       BIC    logLik -2*log(L)  df.resid 
   2711.7    2741.3   -1347.8    2695.7       294 

Random effects:

Conditional model:
 Groups     Name        Variance  Std.Dev. 
 Year:Month (Intercept) 3.528e-02 1.878e-01
 Month      (Intercept) 2.097e-10 1.448e-05
Number of obs: 302, groups:  Year:Month, 72; Month, 12

Dispersion parameter for nbinom2 family ():  217 

Conditional model:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  -7.26818    0.06197 -117.29  < 2e-16 ***
EthnicityAsian, Non-Hispanic -1.62191    0.06199  -26.16  < 2e-16 ***
EthnicityBlack, Non-Hispanic -0.30679    0.05879   -5.22  1.8e-07 ***
EthnicityHispanic            -1.45856    0.05911  -24.68  < 2e-16 ***
EthnicityWhite, Non-Hispanic -1.33834    0.05869  -22.80  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
both_glmmodel_nb <- glmmTMB(
  Maternal_Deaths ~  Subgroup+  (1 |Month/Year),
  offset=log(Live_Births),
  family = nbinom2,
  data = jack
)
summary(both_glmmodel_nb)
 Family: nbinom2  ( log )
Formula:          Maternal_Deaths ~ Subgroup + (1 | Month/Year)
Data: jack
 Offset: log(Live_Births)

      AIC       BIC    logLik -2*log(L)  df.resid 
   4591.0    4637.7   -2284.5    4569.0       507 

Random effects:

Conditional model:
 Groups     Name        Variance  Std.Dev. 
 Year:Month (Intercept) 3.724e-02 1.930e-01
 Month      (Intercept) 2.118e-10 1.455e-05
Number of obs: 518, groups:  Year:Month, 72; Month, 12

Dispersion parameter for nbinom2 family ():  556 

Conditional model:
                                                       Estimate Std. Error
(Intercept)                                            -8.45935    0.02380
Subgroup40 years and over                               1.42486    0.01369
SubgroupAmerican Indian or Alaska Native, Non-Hispanic  1.17596    0.05558
SubgroupAsian, Non-Hispanic                            -0.43285    0.02284
SubgroupBlack, Non-Hispanic                             0.88117    0.01126
SubgroupHispanic                                       -0.26481    0.01296
SubgroupUnder 25 years                                 -0.39048    0.01388
SubgroupWhite, Non-Hispanic                            -0.14833    0.01072
                                                       z value Pr(>|z|)    
(Intercept)                                             -355.4   <2e-16 ***
Subgroup40 years and over                                104.1   <2e-16 ***
SubgroupAmerican Indian or Alaska Native, Non-Hispanic    21.2   <2e-16 ***
SubgroupAsian, Non-Hispanic                              -18.9   <2e-16 ***
SubgroupBlack, Non-Hispanic                               78.2   <2e-16 ***
SubgroupHispanic                                         -20.4   <2e-16 ***
SubgroupUnder 25 years                                   -28.1   <2e-16 ***
SubgroupWhite, Non-Hispanic                              -13.8   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
###Without offset, Ethnicity as fixed effect, Year as random effect #no offset
ethnicityonly_glmmodel_nb2 <- glmmTMB(
  Maternal_Deaths ~  Ethnicity + (1 |Month/Year),
  family = nbinom2,
  data = deaths_df
)
summary(ethnicityonly_glmmodel_nb2)
 Family: nbinom2  ( log )
Formula:          Maternal_Deaths ~ Ethnicity + (1 | Month/Year)
Data: deaths_df

      AIC       BIC    logLik -2*log(L)  df.resid 
   2732.9    2762.6   -1358.4    2716.9       294 

Random effects:

Conditional model:
 Groups     Name        Variance  Std.Dev. 
 Year:Month (Intercept) 3.366e-02 1.835e-01
 Month      (Intercept) 1.702e-10 1.305e-05
Number of obs: 302, groups:  Year:Month, 72; Month, 12

Dispersion parameter for nbinom2 family ():  193 

Conditional model:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   2.90488    0.06226   46.66   <2e-16 ***
EthnicityAsian, Non-Hispanic  0.51728    0.06249    8.28   <2e-16 ***
EthnicityBlack, Non-Hispanic  2.67788    0.05935   45.12   <2e-16 ***
EthnicityHispanic             2.08779    0.05966   35.00   <2e-16 ***
EthnicityWhite, Non-Hispanic  2.92331    0.05925   49.34   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
#without offset
both_glmmodel_nb2 <- glmmTMB(
  Maternal_Deaths ~  Subgroup + (1 |Month/Year),
  family = nbinom2,
  data = jack
)
summary(both_glmmodel_nb2)
 Family: nbinom2  ( log )
Formula:          Maternal_Deaths ~ Subgroup + (1 | Month/Year)
Data: jack

      AIC       BIC    logLik -2*log(L)  df.resid 
   4560.7    4607.4   -2269.3    4538.7       507 

Random effects:

Conditional model:
 Groups     Name        Variance  Std.Dev. 
 Year:Month (Intercept) 3.613e-02 1.901e-01
 Month      (Intercept) 2.443e-10 1.563e-05
Number of obs: 518, groups:  Year:Month, 72; Month, 12

Dispersion parameter for nbinom2 family ():  916 

Conditional model:
                                                        Estimate Std. Error
(Intercept)                                             6.355365   0.023262
Subgroup40 years and over                              -1.556384   0.012937
SubgroupAmerican Indian or Alaska Native, Non-Hispanic -3.470165   0.054964
SubgroupAsian, Non-Hispanic                            -2.936926   0.022406
SubgroupBlack, Non-Hispanic                            -0.776350   0.010338
SubgroupHispanic                                       -1.359431   0.012155
SubgroupUnder 25 years                                 -1.601390   0.013143
SubgroupWhite, Non-Hispanic                            -0.528033   0.009746
                                                       z value Pr(>|z|)    
(Intercept)                                             273.21   <2e-16 ***
Subgroup40 years and over                              -120.31   <2e-16 ***
SubgroupAmerican Indian or Alaska Native, Non-Hispanic  -63.13   <2e-16 ***
SubgroupAsian, Non-Hispanic                            -131.08   <2e-16 ***
SubgroupBlack, Non-Hispanic                             -75.10   <2e-16 ***
SubgroupHispanic                                       -111.84   <2e-16 ***
SubgroupUnder 25 years                                 -121.85   <2e-16 ***
SubgroupWhite, Non-Hispanic                             -54.18   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
glmmodel_nb3 <- glmmTMB(
  Maternal_Deaths ~  Ethnicity +(1|Month/Year),
  family = nbinom2,
  data = deaths_df
)
summary(glmmodel_nb3)
 Family: nbinom2  ( log )
Formula:          Maternal_Deaths ~ Ethnicity + (1 | Month/Year)
Data: deaths_df

      AIC       BIC    logLik -2*log(L)  df.resid 
   2732.9    2762.6   -1358.4    2716.9       294 

Random effects:

Conditional model:
 Groups     Name        Variance  Std.Dev. 
 Year:Month (Intercept) 3.366e-02 1.835e-01
 Month      (Intercept) 1.702e-10 1.305e-05
Number of obs: 302, groups:  Year:Month, 72; Month, 12

Dispersion parameter for nbinom2 family ():  193 

Conditional model:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   2.90488    0.06226   46.66   <2e-16 ***
EthnicityAsian, Non-Hispanic  0.51728    0.06249    8.28   <2e-16 ***
EthnicityBlack, Non-Hispanic  2.67788    0.05935   45.12   <2e-16 ***
EthnicityHispanic             2.08779    0.05966   35.00   <2e-16 ***
EthnicityWhite, Non-Hispanic  2.92331    0.05925   49.34   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
performance::check_model(ethnicityonly_glmmodel_nb)

Code
performance::check_model(both_glmmodel_nb)

Code
performance::check_model(ethnicityonly_glmmodel_nb2)

Code
performance::check_model(both_glmmodel_nb2)

Code
#for non-normal error models
sim <- simulateResiduals(ethnicityonly_glmmodel_nb)
plot(sim)

Code
sim2 <- simulateResiduals(ethnicityonly_glmmodel_nb2)
plot(sim2)

Code
sim3 <- simulateResiduals(both_glmmodel_nb)
plot(sim3)

Code
sim4 <- simulateResiduals(both_glmmodel_nb2)
plot(sim4)

Code
sjPlot::tab_model(ethnicityonly_glmmodel_nb)
  Maternal Deaths
Predictors Incidence Rate Ratios CI p
(Intercept) 0.00 0.00 – 0.00 <0.001
Ethnicity [Asian,
Non-Hispanic]
0.20 0.17 – 0.22 <0.001
Ethnicity [Black,
Non-Hispanic]
0.74 0.66 – 0.83 <0.001
Ethnicity [Hispanic] 0.23 0.21 – 0.26 <0.001
Ethnicity [White,
Non-Hispanic]
0.26 0.23 – 0.29 <0.001
Random Effects
σ2
τ00 Year:Month 0.04
τ00 Month 0.00
N Year 6
N Month 12
Observations 302
Code
sjPlot::tab_model(ethnicityonly_glmmodel_nb2)#no offset
  Maternal Deaths
Predictors Incidence Rate Ratios CI p
(Intercept) 18.26 16.17 – 20.63 <0.001
Ethnicity [Asian,
Non-Hispanic]
1.68 1.48 – 1.90 <0.001
Ethnicity [Black,
Non-Hispanic]
14.55 12.96 – 16.35 <0.001
Ethnicity [Hispanic] 8.07 7.18 – 9.07 <0.001
Ethnicity [White,
Non-Hispanic]
18.60 16.56 – 20.89 <0.001
Random Effects
σ2
τ00 Year:Month 0.03
τ00 Month 0.00
N Year 6
N Month 12
Observations 302
Code
sjPlot::tab_model(both_glmmodel_nb)
  Maternal Deaths
Predictors Incidence Rate Ratios CI p
(Intercept) 0.00 0.00 – 0.00 <0.001
Subgroup [40 years and
over]
4.16 4.05 – 4.27 <0.001
Subgroup [American Indian
or Alaska Native,
Non-Hispanic]
3.24 2.91 – 3.61 <0.001
Subgroup [Asian,
Non-Hispanic]
0.65 0.62 – 0.68 <0.001
Subgroup [Black,
Non-Hispanic]
2.41 2.36 – 2.47 <0.001
Subgroup [Hispanic] 0.77 0.75 – 0.79 <0.001
Subgroup [Under 25 years] 0.68 0.66 – 0.70 <0.001
Subgroup [White,
Non-Hispanic]
0.86 0.84 – 0.88 <0.001
Random Effects
σ2
τ00 Year:Month 0.04
τ00 Month 0.00
N Year 6
N Month 12
Observations 518
Code
sjPlot::tab_model(both_glmmodel_nb2)#no offset
  Maternal Deaths
Predictors Incidence Rate Ratios CI p
(Intercept) 575.57 549.92 – 602.42 <0.001
Subgroup [40 years and
over]
0.21 0.21 – 0.22 <0.001
Subgroup [American Indian
or Alaska Native,
Non-Hispanic]
0.03 0.03 – 0.03 <0.001
Subgroup [Asian,
Non-Hispanic]
0.05 0.05 – 0.06 <0.001
Subgroup [Black,
Non-Hispanic]
0.46 0.45 – 0.47 <0.001
Subgroup [Hispanic] 0.26 0.25 – 0.26 <0.001
Subgroup [Under 25 years] 0.20 0.20 – 0.21 <0.001
Subgroup [White,
Non-Hispanic]
0.59 0.58 – 0.60 <0.001
Random Effects
σ2
τ00 Year:Month 0.04
τ00 Month 0.00
N Year 6
N Month 12
Observations 518
Code
sjPlot::tab_model(glmmodel_nb3)#no offset
  Maternal Deaths
Predictors Incidence Rate Ratios CI p
(Intercept) 18.26 16.17 – 20.63 <0.001
Ethnicity [Asian,
Non-Hispanic]
1.68 1.48 – 1.90 <0.001
Ethnicity [Black,
Non-Hispanic]
14.55 12.96 – 16.35 <0.001
Ethnicity [Hispanic] 8.07 7.18 – 9.07 <0.001
Ethnicity [White,
Non-Hispanic]
18.60 16.56 – 20.89 <0.001
Random Effects
σ2
τ00 Year:Month 0.03
τ00 Month 0.00
N Year 6
N Month 12
Observations 302

Modeling and Results

  • Explain your data preprocessing and cleaning steps.

  • Present your key findings in a clear and concise manner.

  • Use visuals to support your claims.

  • Tell a story about what the data reveals.

Conclusion

  • Summarize your key findings.

  • Discuss the implications of your results.

References

Agresti, A. 2015. Foundations of Linear and Generalized Linear Models. Wiley Series in Probability and Statistics. Wiley. https://books.google.com/books?id=jlIqBgAAQBAJ.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Candy, Steven G. 2000. “The Application of Generalized Linear Mixed Models to Multi-Level Sampling for Insect Population Monitoring.” Environmental and Ecological Statistics 7 (3): 217–38.
Lee, Youngjo, and John A Nelder. 1996. “Hierarchical Generalized Linear Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 58 (4): 619–56.
McGillycuddy, Maeve, David I. Warton, Gordana Popovic, and Benjamin M. Bolker. 2025. “Parsimoniously Fitting Large Multivariate Random Effects in glmmTMB.” Journal of Statistical Software 112 (1): 1–19. https://doi.org/10.18637/jss.v112.i01.
Owili, Patrick Opiyo, Tang-Huang Lin, Miriam Adoyo Muga, and Wei-Hung Lien. 2020. “Impacts of Discriminated PM2. 5 on Global Under-Five and Maternal Mortality.” Scientific Reports 10 (1): 17654.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Salinas Ruı́z, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramı́rez, and Jose Crossa Hiriart. 2023. Generalized Linear Mixed Models with Applications in Agriculture and Biology. Springer Nature.
Tawiah, Kassim, Samuel Iddi, and Anani Lotsi. 2020. “On Zero-Inflated Hierarchical Poisson Models with Application to Maternal Mortality Data.” International Journal of Mathematics and Mathematical Sciences 2020 (1): 1407320.
Thall, Peter F. 1988. “Mixed Poisson Likelihood Regression Models for Longitudinal Interval Count Data.” Biometrics, 197–209.
Thompson, Jennifer A, Clemence Leyrat, Katherine L Fielding, and Richard J Hayes. 2022. “Cluster Randomised Trials with a Binary Outcome and a Small Number of Clusters: Comparison of Individual and Cluster Level Analysis Method.” BMC Medical Research Methodology 22 (1): 222.
Wang, Ke-Sheng, Xuefeng Liu, Muyiwa Ategbole, Xin Xie, Ying Liu, Chun Xu, Changchun Xie, and Zhanxin Sha. 2017. “Generalized Linear Mixed Model Analysis of Urban-Rural Differences in Social and Behavioral Factors for Colorectal Cancer Screening.” Asian Pacific Journal of Cancer Prevention: APJCP 18 (9): 2581.
Zuur, Alain F, Elena N Ieno, Neil J Walker, Anatoly A Saveliev, Graham M Smith, et al. 2009. Mixed Effects Models and Extensions in Ecology with r. Vol. 574. Springer.